Fix serve quadlet: SELinux label and ready-signal healthcheck #18

Merged
jdoss merged 1 commit into master from fix/serve-quadlet-selinux-healthcheck on Apr 8, 2026

@jdoss jdoss commented Apr 8, 2026

Summary

Two regressions in generate_container_serve_quadlet caught on the first real deployment of the generator output:

  1. Container crashed on startup with Config file not found: /etc/psi/config.yaml because the quadlet did not set SecurityLabelType. Without container_runtime_t the container runs as container_t and cannot read /etc/psi (labeled etc_t) via a :ro mount. Using :Z would relabel the host config dir, which we never want.
  2. Even after fixing the first regression, the unit would have sat in activating until systemd's TimeoutStartSec killed it, because quadlet emits Type=notify by default and expects sd_notify(READY=1). psi serve does not call sd_notify itself.
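
For context on the second item, here is a minimal sketch of the ready signal that a Type=notify unit waits for. It is not part of this PR: psi serve does not send this message, and the fix relies on Notify=healthy so that podman sends it instead. The function name notify_ready is hypothetical; the NOTIFY_SOCKET variable and the READY=1 datagram follow the systemd notification protocol.

```python
import os
import socket
from typing import Optional


def notify_ready(notify_socket: Optional[str] = None) -> bool:
    """Send READY=1 to the systemd notification socket, if one is known.

    Returns False when no socket address is available, which is the case
    for any process that was not started by a Type=notify unit.
    """
    address = notify_socket or os.environ.get("NOTIFY_SOCKET")
    if not address:
        return False
    # A leading "@" denotes a Linux abstract-namespace socket.
    if address.startswith("@"):
        address = "\0" + address[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(b"READY=1", address)
    return True
```

Until something delivers that datagram, systemd keeps the unit in activating, which is the behaviour described above.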

Why

First real use of psi systemd install --mode container on the test server exposed both gaps. generate_container_provider_setup_quadlet already sets SecurityLabelType=container_runtime_t; the serve generator was inconsistent. The Butane-managed quadlet that had been running previously had both SecurityLabelType and Notify=healthy + HealthCmd; the generator dropped them.

What changes

psi/unitgen.py

  • SecurityLabelType=container_runtime_t
  • Notify=healthy
  • HealthCmd=curl -sf --unix-socket <sock> http://localhost/healthz
  • HealthInterval=30s, HealthRetries=10, HealthStartPeriod=60s, HealthTimeout=5s

HealthStartPeriod=60s gives HSM login plus encrypted cache decrypt enough headroom before the first probe — Nitrokey HSM startup alone can take 25 seconds.
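
As a rough illustration of the keys listed above, the sketch below renders them as quadlet lines. The helper name serve_quadlet_hardening_lines and its socket_path parameter are hypothetical; the real code lives in generate_container_serve_quadlet and may be structured quite differently. The socket path is left as the `<sock>` placeholder from the description.

```python
def serve_quadlet_hardening_lines(socket_path: str) -> list[str]:
    """Return the [Container] keys added by this PR as quadlet lines.

    socket_path is whatever unix socket psi serve listens on; it is a
    parameter here because the PR description elides the actual path.
    """
    health_cmd = f"curl -sf --unix-socket {socket_path} http://localhost/healthz"
    return [
        # Lets the container read /etc/psi without relabeling the host dir.
        "SecurityLabelType=container_runtime_t",
        # podman sends READY=1 once the first healthcheck passes.
        "Notify=healthy",
        f"HealthCmd={health_cmd}",
        "HealthInterval=30s",
        "HealthRetries=10",
        "HealthStartPeriod=60s",
        "HealthTimeout=5s",
    ]


if __name__ == "__main__":
    # "<sock>" is kept as a placeholder, not a real path.
    print("\n".join(["[Container]", *serve_quadlet_hardening_lines("<sock>")]))
```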

tests/test_unitgen.py

  • test_serve_quadlet_has_security_label_type — asserts the label is emitted
  • test_serve_quadlet_has_notify_healthy — asserts Notify=healthy and a HealthCmd pointing at /healthz with a start period long enough for HSM startup
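
The assertions described above might look roughly like the following. This is a self-contained sketch, not the code in tests/test_unitgen.py: SAMPLE_QUADLET stands in for the generator output, and parse_container_section, seconds, and the test names are hypothetical. The 25-second figure is the Nitrokey startup time quoted earlier.

```python
# Stand-in for generator output; "<sock>" is a placeholder, not a real path.
SAMPLE_QUADLET = """\
[Container]
SecurityLabelType=container_runtime_t
Notify=healthy
HealthCmd=curl -sf --unix-socket <sock> http://localhost/healthz
HealthStartPeriod=60s
"""

HSM_STARTUP_SECONDS = 25  # Nitrokey HSM startup time cited in the PR


def parse_container_section(text: str) -> dict[str, str]:
    """Collect Key=Value pairs from the [Container] section only."""
    values: dict[str, str] = {}
    in_container = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            in_container = line == "[Container]"
            continue
        if in_container and "=" in line and not line.startswith("#"):
            # Split on the first "=" so values may contain "=" themselves.
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    return values


def seconds(duration: str) -> int:
    """Convert a plain 'Ns' duration such as '60s'; other units are not handled."""
    if not duration.endswith("s"):
        raise ValueError(f"unsupported duration: {duration!r}")
    return int(duration[:-1])


def test_security_label_type_sketch() -> None:
    section = parse_container_section(SAMPLE_QUADLET)
    assert section["SecurityLabelType"] == "container_runtime_t"


def test_notify_healthy_sketch() -> None:
    section = parse_container_section(SAMPLE_QUADLET)
    assert section["Notify"] == "healthy"
    assert "/healthz" in section["HealthCmd"]
    # The start period must leave room for HSM login before the first probe.
    assert seconds(section["HealthStartPeriod"]) > HSM_STARTUP_SECONDS
```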

Test plan

  • uv run ruff check psi/ tests/ — clean
  • uv run ruff format --check psi/ tests/ — clean
  • uv run ty check — clean
  • uv run pytest -q — 296 passed (2 new)
  • On the test server: pull the new image, regenerate quadlets, daemon-reload, restart psi-secrets.service, confirm the unit goes active within HealthStartPeriod of starting and that podman exec psi-secrets psi cache status works

Two regressions in generate_container_serve_quadlet caught when the
generator produced its first real deployment on the test server.

First: the container crashed on startup with
'Config file not found: /etc/psi/config.yaml', because the generated
quadlet did not set SecurityLabelType. Without container_runtime_t,
the container runs under the default container_t SELinux type, which
cannot read /etc/psi (labeled etc_t) via a :ro bind mount — :Z would
relabel the host dir, which we never want on shared config. Setting
SecurityLabelType=container_runtime_t is the standard workaround and
matches what generate_container_provider_setup_quadlet already does.

Second: quadlet emits Type=notify for .container units by default and
expects podman to send sd_notify(READY=1). psi serve does not call
sd_notify itself, so the unit used to sit in 'activating' until
systemd's TimeoutStartSec killed it. Notify=healthy plus a HealthCmd
that curls the /healthz endpoint through the unix socket makes podman
fire the ready signal once the first healthcheck passes.
HealthStartPeriod=60s gives HSM login + cache decrypt enough headroom
before the first probe.
@jdoss jdoss merged commit d675544 into master Apr 8, 2026
2 checks passed